

matrix factorization model




Breaking the Cold-Start Barrier: Reinforcement Learning with Double and Dueling DQNs

arXiv.org Artificial Intelligence

Recommender systems struggle to provide accurate suggestions to new users with limited interaction history, a challenge known as the cold-user problem. This paper proposes a reinforcement learning approach using Double and Dueling Deep Q-Networks (DQN) to dynamically learn user preferences from sparse feedback, enhancing recommendation accuracy without relying on sensitive demographic data. By integrating these advanced DQN variants with a matrix factorization model, we achieve superior performance on a large e-commerce dataset compared to traditional methods like popularity-based and active learning strategies. Experimental results show that our method, particularly Dueling DQN, reduces Root Mean Square Error (RMSE) for cold users, offering an effective solution for privacy-constrained environments.
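The abstract does not give the network architecture, the state encoding, or how the matrix factorization model's predictions enter the reward. The NumPy sketch below therefore shows only the two generic building blocks that the named DQN variants add, plus the RMSE metric. Dueling DQN aggregates a state value and per-action advantages. Double DQN lets the online network choose the next action while the target network evaluates it. All function names are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    value: shape (batch,); advantages: shape (batch, n_actions).
    """
    return value[:, None] + advantages - advantages.mean(axis=1, keepdims=True)

def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online net selects a', the target net scores it."""
    best_actions = np.argmax(q_next_online, axis=1)
    evaluated = q_next_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated

def rmse(pred, truth):
    """Root mean square error between predicted and observed ratings."""
    diff = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

q = dueling_q(np.array([1.0]), np.array([[0.0, 2.0, 4.0]]))  # -> [[-1., 1., 3.]]
y = double_dqn_targets(np.array([1.0]), np.array([0.0]),
                       np.array([[0.1, 0.9]]), np.array([[5.0, 2.0]]),
                       gamma=0.5)  # -> [2.]
```

In the second call, the online network picks action 1, so the target is 1 + 0.5 · 2 = 2. Plain DQN would instead take the target network's maximum (5) and produce 3.5. This decoupling is what Double DQN uses to curb overestimation.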


Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion

Neural Information Processing Systems

Matrix factorization models have been extensively studied as a valuable test-bed for understanding the implicit biases of overparameterized models. Although both low nuclear norm and low rank regularization have been studied for these models, a unified understanding of when, how, and why they achieve different implicit regularization effects remains elusive. In this work, we systematically investigate the implicit regularization of matrix factorization for solving matrix completion problems. We empirically discover that the connectivity of observed data plays a key role in the implicit bias, with a transition from low nuclear norm to low rank as data shifts from disconnected to connected with increased observations. We identify a hierarchy of intrinsic invariant manifolds in the loss landscape that guide the training trajectory to evolve from low-rank to higher-rank solutions.
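One common way to make "connectivity of observed data" concrete is to treat each row and each column as a node of a bipartite graph and each observed entry as an edge. Whether this matches the paper's exact definition is an assumption here. The sketch counts connected components with a union-find structure. `observation_components` is a hypothetical helper.

```python
def observation_components(n_rows, n_cols, observed):
    """Count connected components of the bipartite row/column graph.

    Rows are nodes 0..n_rows-1, columns are nodes n_rows..n_rows+n_cols-1,
    and each observed (i, j) entry is an edge. Only nodes touched by at least
    one observation are counted.
    """
    parent = list(range(n_rows + n_cols))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for i, j in observed:
        union(i, n_rows + j)
    touched = {i for i, _ in observed} | {n_rows + j for _, j in observed}
    return len({find(v) for v in touched})

print(observation_components(2, 2, [(0, 0), (1, 1)]))          # 2: disconnected
print(observation_components(2, 2, [(0, 0), (1, 1), (0, 1)]))  # 1: connected
```

Adding the single observation (0, 1) links the two diagonal blocks. This mirrors the abstract's picture of data shifting from disconnected to connected as observations increase.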


Reviews: Implicit Regularization in Deep Matrix Factorization

Neural Information Processing Systems

This paper studies the implicit regularization of gradient descent over deep neural networks for deep matrix factorization models. The paper begins with a review of prior work on how running gradient descent on a shallow matrix factorization model, with a small learning rate and initialization close to zero, tends to converge to solutions that minimize the nuclear norm [20] (Conjecture 1). This discussion is then extended to deep matrix factorization, where predictive performance improves with depth when the number of observed entries is small. Experimental results that challenge Conjecture 1 are then presented (Figure 2); they indicate that, when few entries are observed, implicit regularization in both shallow and deep matrix factorization converges to low-rank solutions rather than minimizing the nuclear norm. Finally, a theoretical and experimental analysis of the dynamics of gradient flow for deep matrix factorization is presented, showing how the singular values and singular vectors of the product matrix evolve during training and how this leads to implicit regularization that induces low-rank solutions.
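The core object in the review is the end-to-end matrix W = W_N ⋯ W_1 trained by gradient descent on observed entries only. Below is a minimal NumPy sketch of one gradient step. It assumes square d × d factors, a masked squared loss, and a small random initialization. The hyperparameters and helper names are hypothetical, and this does not reproduce the paper's experimental setup. The gradient for factor j follows from the chain rule, because W = (W_N ⋯ W_{j+1}) W_j (W_{j-1} ⋯ W_1).

```python
import numpy as np

def end_to_end(factors):
    """Product W = W_N @ ... @ W_1 for factors given as [W_1, ..., W_N]."""
    out = factors[0]
    for w in factors[1:]:
        out = w @ out
    return out

def masked_loss(factors, target, mask):
    """0.5 * sum of squared errors over observed entries (mask is 0/1)."""
    return 0.5 * float(np.sum((mask * (end_to_end(factors) - target)) ** 2))

def gd_step(factors, target, mask, lr):
    """One simultaneous gradient-descent step on every factor."""
    d = target.shape[0]
    g = mask * (end_to_end(factors) - target)  # dL/dW at the end-to-end matrix
    new = []
    for j, w in enumerate(factors):
        left = np.eye(d)
        for f in factors[j + 1:]:
            left = f @ left    # W_N ... W_{j+1}
        right = np.eye(d)
        for f in factors[:j]:
            right = f @ right  # W_{j-1} ... W_1
        new.append(w - lr * left.T @ g @ right.T)  # chain rule for left @ W_j @ right
    return new

rng = np.random.default_rng(0)
d, depth = 5, 3
u = rng.standard_normal(d); u /= np.linalg.norm(u)
v = rng.standard_normal(d); v /= np.linalg.norm(v)
target = np.outer(u, v)                            # rank-1 ground truth, unit spectral norm
mask = (rng.random((d, d)) < 0.6).astype(float)    # roughly 60% of entries observed
factors = [0.1 * rng.standard_normal((d, d)) for _ in range(depth)]  # small init
initial_loss = masked_loss(factors, target, mask)
for _ in range(3000):
    factors = gd_step(factors, target, mask, lr=0.05)
final_loss = masked_loss(factors, target, mask)
singular_values = np.linalg.svd(end_to_end(factors), compute_uv=False)
```

Inspecting `singular_values` after training is one informal way to see whether the learned product is dominated by a few large singular values. How sharply that happens depends on depth, initialization scale, and how many entries are observed.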


Stochastic gradient descent estimation of generalized matrix factorization models with application to single-cell RNA sequencing data

arXiv.org Machine Learning

Single-cell RNA sequencing allows the quantitation of gene expression at the individual cell level, enabling the study of cellular heterogeneity and gene expression dynamics. Dimensionality reduction is a common preprocessing step to simplify the visualization, clustering, and phenotypic characterization of samples. This step, often performed using principal component analysis or closely related methods, is challenging because of the size and complexity of the data. In this work, we present a generalized matrix factorization model assuming a general exponential dispersion family distribution and we show that many of the proposed approaches in the single-cell dimensionality reduction literature can be seen as special cases of this model. Furthermore, we propose a scalable adaptive stochastic gradient descent algorithm that allows us to estimate the model efficiently, enabling the analysis of millions of cells. Our contribution extends to introducing a novel warm start initialization method, designed to accelerate algorithm convergence and increase the precision of final estimates. Moreover, we discuss strategies for dealing with missing values and model selection. We benchmark the proposed algorithm through extensive numerical experiments against state-of-the-art methods and showcase its use in real-world biological applications. The proposed method systematically outperforms existing methods of both generalized and non-negative matrix factorization, demonstrating faster execution times while maintaining, or even enhancing, matrix reconstruction fidelity and accuracy in biological signal extraction. Finally, all the methods discussed here are implemented in an efficient open-source R package, sgdGMF, available at github/CristianCastiglione/sgdGMF
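As a hedged illustration of the model family, the sketch takes the Poisson case with a log link, one member of the exponential dispersion family. It fits η = U Vᵀ by plain minibatch SGD over rows (cells). This is not the adaptive algorithm, warm start, or parameterization of the sgdGMF package. It omits intercepts and covariates, and the step size, batch size, and function names are hypothetical choices for a small synthetic example.

```python
import numpy as np

def poisson_nll(Y, U, V):
    """Poisson negative log-likelihood with log link, dropping the log(y!) constant."""
    eta = U @ V.T
    return float(np.sum(np.exp(eta) - Y * eta))

def sgd_poisson_mf(Y, rank, lr=0.005, epochs=50, batch_size=10, seed=0):
    """Minibatch SGD over rows for Y_ij ~ Poisson(exp((U V^T)_ij))."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    U = 0.1 * rng.standard_normal((n, rank))  # row (cell) scores
    V = 0.1 * rng.standard_normal((p, rank))  # column (gene) loadings
    history = [poisson_nll(Y, U, V)]
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            rows = order[start:start + batch_size]
            resid = np.exp(U[rows] @ V.T) - Y[rows]  # d NLL / d eta for the log link
            grad_U = resid @ V                       # both gradients use the old U, V
            grad_V = resid.T @ U[rows]
            U[rows] -= lr * grad_U
            V -= lr * grad_V
        history.append(poisson_nll(Y, U, V))
    return U, V, history

rng = np.random.default_rng(1)
U_true = 0.5 * rng.standard_normal((100, 2))
V_true = 0.5 * rng.standard_normal((20, 2))
Y = rng.poisson(np.exp(U_true @ V_true.T)).astype(float)  # synthetic counts
U_hat, V_hat, history = sgd_poisson_mf(Y, rank=2)
```

Swapping the residual `exp(eta) - Y` for the derivative of another family's negative log-likelihood (for example, `eta - Y` for a Gaussian with identity link) gives other special cases of the same generalized model.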


Subspace-Constrained Quadratic Matrix Factorization: Algorithm and Applications

arXiv.org Artificial Intelligence

Matrix factorization has emerged as a widely adopted framework for modeling data exhibiting low-rank structures. To address challenges in manifold learning, this paper presents a subspace-constrained quadratic matrix factorization model. The model is designed to jointly learn key low-dimensional structures, including the tangent space, the normal subspace, and the quadratic form that links the tangent space to a low-dimensional representation. We solve the proposed factorization model using an alternating minimization method, involving an in-depth investigation of nonlinear regression and projection subproblems. Theoretical properties of the quadratic projection problem and convergence characteristics of the alternating strategy are also investigated. To validate our approach, we conduct numerical experiments on synthetic and real-world datasets. Results demonstrate that our model outperforms existing methods, highlighting its robustness and efficacy in capturing core low-dimensional structures.
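To show the shape of a quadratic factorization, consider points modeled as x ≈ c + U τ + (quadratic terms in τ), where τ is a low-dimensional representation. The sketch below covers only one half of an alternating scheme. With the representations τ_i held fixed, the fit reduces to linear least squares on the features [1, τ, τ_a τ_b]. It deliberately omits the paper's subspace constraint (tangent/normal structure) and the projection step that updates τ_i, so it is a simplified stand-in, not the authors' algorithm. Helper names are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_features(T):
    """Map rows tau_i of T (n x d) to [1, tau, tau_a * tau_b for a <= b]."""
    n, d = T.shape
    quad = [T[:, a] * T[:, b] for a, b in combinations_with_replacement(range(d), 2)]
    return np.column_stack([np.ones(n), T] + quad)

def fit_quadratic_map(X, T):
    """Least-squares fit of X (n x D) given fixed representations T (n x d).

    Returned rows: intercept c, then the linear map, then quadratic coefficients.
    No orthogonality or subspace constraint is imposed here.
    """
    coef, *_ = np.linalg.lstsq(quadratic_features(T), X, rcond=None)
    return coef

# Synthetic points on the paraboloid x3 = x1^2 + x2^2, with tau = (x1, x2).
rng = np.random.default_rng(0)
T = rng.uniform(-1.0, 1.0, size=(50, 2))
X = np.column_stack([T[:, 0], T[:, 1], T[:, 0] ** 2 + T[:, 1] ** 2])
coef = fit_quadratic_map(X, T)
reconstruction = quadratic_features(T) @ coef
```

Because these points lie exactly on a quadratic surface in the chosen coordinates, the regression step reconstructs them to numerical precision. On real data the projection step would then re-estimate each τ_i, and the two steps would alternate.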


Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion

arXiv.org Artificial Intelligence

Matrix factorization models have been extensively studied as a valuable test-bed for understanding the implicit biases of overparameterized models. Although both low nuclear norm and low rank regularization have been studied for these models, a unified understanding of when, how, and why they achieve different implicit regularization effects remains elusive. In this work, we systematically investigate the implicit regularization of matrix factorization for solving matrix completion problems. We empirically discover that the connectivity of observed data plays a crucial role in the implicit bias, with a transition from low nuclear norm to low rank as data shifts from disconnected to connected with increased observations. We identify a hierarchy of intrinsic invariant manifolds in the loss landscape that guide the training trajectory to evolve from low-rank to higher-rank solutions. Based on this finding, we theoretically characterize the training trajectory as following the hierarchical invariant manifold traversal process, generalizing the characterization of Li et al. (2020) to include the disconnected case. Furthermore, we establish conditions that guarantee minimum nuclear norm, closely aligning with our experimental findings, and we provide a dynamics characterization condition for ensuring minimum rank. Our work reveals the intricate interplay between data connectivity, training dynamics, and implicit regularization in matrix factorization models.
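A small example shows why "minimum nuclear norm" and "minimum rank" can single out different solutions. Take a 2 × 2 matrix whose only observed entries are the two diagonal ones, both equal to 1. This is a disconnected pattern. Any completion has nuclear norm at least |trace| = 2, so both the identity fill and the all-ones fill are nuclear-norm minimizers, yet only the all-ones fill has rank 1. The NumPy helpers below are hypothetical names for standard quantities.

```python
import numpy as np

def nuclear_norm(W):
    """Sum of singular values."""
    return float(np.linalg.svd(W, compute_uv=False).sum())

def numerical_rank(W, tol=1e-10):
    """Number of singular values above a tolerance."""
    return int(np.sum(np.linalg.svd(W, compute_uv=False) > tol))

identity_fill = np.array([[1.0, 0.0], [0.0, 1.0]])  # unobserved entries set to 0
ones_fill = np.array([[1.0, 1.0], [1.0, 1.0]])      # unobserved entries set to 1

for name, W in [("identity", identity_fill), ("ones", ones_fill)]:
    print(name, nuclear_norm(W), numerical_rank(W))
```

Both fills tie on nuclear norm, so a nuclear-norm criterion alone does not pick out the rank-1 completion in this disconnected example. This is only an illustration of the distinction and does not reproduce the paper's conditions for either regime.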


AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction

arXiv.org Artificial Intelligence

Predicting opinion trends on a range of social issues, from climate change to gay marriage, is crucial for making informed decisions, tracking social changes, and understanding the dynamics of opinion formation (Brooks and Manza, 2006; Burstein, 2003). Recently, numerous breakthroughs have been made in inferring and predicting people's opinions and preferences from their written records, such as books from the past (e.g., Google Ngram), internet search patterns (e.g., Google Trends), and public sentiment on social media (e.g., Twitter, Facebook, YouTube) (Beauchamp, 2017; Grimmer et al., 2022; Moore et al., 2019; O'Connor et al., 2010; Stephens-Davidowitz, 2017). However, using digital trace data to predict public opinion presents a substantial challenge, as these "proxy" measures cannot be deemed reliable without validating them against other "ground truth" benchmarks, such as surveys (Beauchamp, 2017; Ferraro and Farmer, 1999). Even if digital trace data can closely track public opinion trends, their unobtrusive and anonymous nature raises questions about their ability to truly represent the diverse voices of the population, particularly given the skewed representation of demographic groups in digital traces (Cesare et al., 2018). Although digital trace data cover a broad spectrum of opinions, relying on them makes it hard to represent the voice of the entire population evenly.


Optimistic Estimate Uncovers the Potential of Nonlinear Models

arXiv.org Artificial Intelligence

We propose an optimistic estimate to evaluate the best possible fitting performance of nonlinear models. It yields an optimistic sample size that quantifies the smallest possible sample size needed to fit/recover a target function using a nonlinear model. We estimate the optimistic sample sizes for matrix factorization models, deep models, and deep neural networks (DNNs) with fully-connected or convolutional architecture. For each nonlinear model, our estimates predict a specific subset of targets that can be fitted at overparameterization, which our experiments confirm. Our optimistic estimate reveals two special properties of DNN models: free expressiveness in width and costly expressiveness in connection. These properties suggest the following architecture design principles for DNNs: (i) feel free to add neurons/kernels; (ii) refrain from connecting neurons. Overall, our optimistic estimate theoretically unveils the vast potential of nonlinear models in fitting at overparameterization. Based on this framework, we anticipate gaining, in the near future, a deeper understanding of how and why numerous nonlinear models such as DNNs can effectively realize their potential in practice.